Uniqueness of DRS as the 2 Operator Resolvent-Splitting and Impossibility of 3 Operator Resolvent-Splitting
Given the success of Douglas--Rachford splitting (DRS), it is natural to ask
whether DRS can be generalized. Are there other 2 operator resolvent-splittings
sharing the favorable properties of DRS? Can DRS be generalized to 3 operators?
This work presents the answers: no and no. In a certain sense, DRS is the
unique 2 operator resolvent-splitting, and generalizing DRS to 3 operators is
impossible without lifting, where lifting roughly corresponds to enlarging the
problem size. The impossibility result further raises a question. How much
lifting is necessary to generalize DRS to 3 operators? This work presents the
answer by providing a novel 3 operator resolvent-splitting with provably
minimal lifting that directly generalizes DRS. Comment: Published in Mathematical Programming.
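As background, the classical two-operator DRS iteration can be sketched on a convex feasibility problem. The intervals, starting point, and iteration count below are illustrative assumptions, not taken from the paper.

```python
# Douglas--Rachford splitting (DRS) for finding a point in the
# intersection of two convex sets A = [0, 2] and B = [1, 3].
# Iteration: x = proj_A(z); y = proj_B(2x - z); z = z + y - x.

def proj(lo, hi, v):
    """Euclidean projection of v onto the interval [lo, hi]."""
    return min(max(v, lo), hi)

def drs_feasibility(z=5.0, iters=100):
    for _ in range(iters):
        x = proj(0.0, 2.0, z)          # resolvent of the first operator
        y = proj(1.0, 3.0, 2 * x - z)  # resolvent of the second operator
        z = z + y - x                  # fixed-point update on z
    return x

print(drs_feasibility())  # a point in [1, 2] = [0, 2] ∩ [1, 3]
```

Here the resolvents reduce to projections; for general monotone operators the same three-line iteration applies with proximal/resolvent maps in place of `proj`.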
Linear Convergence of Cyclic SAGA
In this work, we present and analyze C-SAGA, a (deterministic) cyclic variant
of SAGA. C-SAGA is an incremental gradient method that minimizes a sum of
differentiable convex functions by cyclically accessing their gradients. Even
though the theory of stochastic algorithms is more mature than that of cyclic
counterparts in general, practitioners often prefer cyclic algorithms. We prove
C-SAGA converges linearly under the standard assumptions. Then, we compare the
rate of convergence with the full gradient method, (stochastic) SAGA, and
incremental aggregated gradient (IAG), theoretically and experimentally. Comment: Published in Optimization Letters.
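A minimal sketch of the cyclic-SAGA idea on a toy finite sum of quadratics; the objective, stepsize, and iteration budget are illustrative assumptions, not the paper's experiments.

```python
# Cyclic SAGA sketch: minimize (1/n) * sum_i 0.5*(x - b_i)^2 by
# cycling through component gradients while maintaining a table of
# previously seen gradients (the SAGA variance-reduction device).

def c_saga(b, step=0.2, epochs=200):
    n = len(b)
    x = 0.0
    table = [x - bi for bi in b]      # stored gradient of each component
    avg = sum(table) / n              # running average of the table
    for _ in range(epochs):
        for i in range(n):            # cyclic (deterministic) index order
            g_new = x - b[i]          # fresh gradient of component i
            x -= step * (g_new - table[i] + avg)
            avg += (g_new - table[i]) / n
            table[i] = g_new
    return x

b = [1.0, 2.0, 3.0, 4.0, 5.0]
print(c_saga(b))  # approaches the minimizer mean(b) = 3.0
```

The only change from stochastic SAGA is the index schedule: `i` sweeps `0, 1, ..., n-1` instead of being sampled uniformly.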
Proximal-Proximal-Gradient Method
In this paper, we present the proximal-proximal-gradient method (PPG), a
novel optimization method that is simple to implement and simple to
parallelize. PPG generalizes the proximal-gradient method and ADMM and is
applicable to minimization problems written as a sum of many differentiable and
many non-differentiable convex functions. The non-differentiable functions can
be coupled. We furthermore present a related stochastic variation, which we
call stochastic PPG (S-PPG). S-PPG can be interpreted as a generalization of
Finito and MISO to the sum of many coupled non-differentiable convex
functions. We present many applications that can benefit from PPG and S-PPG and
prove convergence for both methods. A key strength of PPG and S-PPG, compared
to existing methods, is their ability to directly handle a large sum of
non-differentiable, non-separable functions with a constant stepsize
independent of the number of functions. Such non-diminishing stepsizes allow
them to be fast.
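Since PPG generalizes the proximal-gradient method, a minimal proximal-gradient (ISTA) sketch gives useful background. The scalar lasso problem below is an illustrative assumption, not the PPG iteration itself.

```python
# Proximal-gradient (ISTA) background sketch for the scalar problem
#   minimize 0.5*(x - 3)^2 + lam*|x|,
# whose closed-form solution is the soft-threshold soft(3, lam).
import math

def soft_threshold(v, t):
    """Proximal operator of t*|.| (soft-thresholding)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(a=3.0, lam=1.0, step=0.5, iters=200):
    x = 0.0
    for _ in range(iters):
        # gradient step on the smooth part, then prox on the nonsmooth part
        x = soft_threshold(x - step * (x - a), step * lam)
    return x

print(ista())  # → 2.0, since soft(3, 1) = 2
```

PPG replaces the single prox with two proximal evaluations per iteration, which is what lets it handle sums of coupled non-differentiable terms.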
Adaptive Importance Sampling via Stochastic Convex Programming
We show that the variance of the Monte Carlo estimator that is importance
sampled from an exponential family is a convex function of the natural
parameter of the distribution. With this insight, we propose an adaptive
importance sampling algorithm that simultaneously improves the choice of
sampling distribution while accumulating a Monte Carlo estimate. Exploiting
convexity, we prove that the method's unbiased estimator has variance that is
asymptotically optimal over the exponential family.
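A sketch of importance sampling within an exponential family: a Gaussian tilted by its natural (mean) parameter. The rare-event target and the choice theta = 3 are illustrative assumptions, not the paper's adaptive scheme.

```python
# Importance sampling from an exponential family: estimate
# p = P(X > 3) for X ~ N(0, 1) by sampling from the tilted density
# N(theta, 1); the likelihood ratio is exp(theta^2/2 - theta*x).
import math, random

def is_estimate(theta, n=200_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)  # sample from the tilted distribution
        if x > 3.0:
            total += math.exp(theta**2 / 2 - theta * x)  # importance weight
    return total / n

# theta = 3 centers the samples on the rare event, cutting variance
# sharply relative to naive Monte Carlo (theta = 0). True p ≈ 1.3499e-3.
print(is_estimate(3.0))
```

The paper's contribution is to tune theta adaptively by exploiting the convexity of the estimator's variance in the natural parameter; here theta is simply fixed by hand.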
ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs
Despite remarkable empirical success, the training dynamics of generative
adversarial networks (GAN), which involves solving a minimax game using
stochastic gradients, is still poorly understood. In this work, we analyze
last-iterate convergence of simultaneous gradient descent (simGD) and its
variants under the assumption of convex-concavity, guided by a continuous-time
analysis with differential equations. First, we show that simGD, as is,
converges with stochastic subgradients under strict convexity in the primal
variable. Second, we generalize optimistic simGD to accommodate an optimism
rate separate from the learning rate and show its convergence with full
gradients. Finally, we present anchored simGD, a new method, and show
convergence with stochastic subgradients.
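The bilinear game min_x max_y xy is the standard toy example of why plain simGD needs modification: each simultaneous step multiplies the squared norm of the iterate by (1 + step^2), so the iterates spiral outward. This toy game is an illustrative assumption, not the paper's setting.

```python
# Simultaneous gradient descent (simGD) on the bilinear minimax game
#   min_x max_y x*y.
# The unique saddle point is (0, 0), yet simGD diverges from it.

def simgd_norms(x=1.0, y=1.0, step=0.1, iters=100):
    start = (x * x + y * y) ** 0.5
    for _ in range(iters):
        x, y = x - step * y, y + step * x  # simultaneous update
    return start, (x * x + y * y) ** 0.5

start, end = simgd_norms()
print(start, end)  # the iterate norm grows: end > start
```

Optimism and anchoring are two ways of damping exactly this rotational divergence, which is the behavior the ODE analysis makes precise.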
A New Use of Douglas-Rachford Splitting and ADMM for Identifying Infeasible, Unbounded, and Pathological Conic Programs
In this paper, we present a method for identifying infeasible, unbounded, and
pathological conic programs based on Douglas-Rachford splitting, or
equivalently ADMM. When an optimization program is infeasible, unbounded, or
pathological, the iterates of Douglas-Rachford splitting diverge. Somewhat
surprisingly, such divergent iterates still provide useful information, which
our method uses for identification. In addition, for strongly infeasible
problems the method produces a separating hyperplane and informs the user on
how to minimally modify the given problem to achieve strong feasibility. As a
first-order method, the proposed algorithm relies on simple subroutines, and
therefore is simple to implement and has a low per-iteration cost.
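A sketch of the phenomenon the method exploits: on a (strongly) infeasible feasibility problem, the DRS iterates diverge, but the per-step drift equals the gap between the sets and the shadow iterates settle at the nearest points. The disjoint intervals below are illustrative assumptions.

```python
# DRS on the infeasible problem "find x in A ∩ B" with A = [0, 1] and
# B = [2, 3] disjoint: z_k diverges, but z_{k+1} - z_k converges to
# dist(A, B) = 1, and (x, y) converge to the nearest points 1 and 2.

def proj(lo, hi, v):
    return min(max(v, lo), hi)

def drs_infeasible(z=0.0, iters=50):
    for _ in range(iters):
        x = proj(0.0, 1.0, z)
        y = proj(2.0, 3.0, 2 * x - z)
        z_next = z + y - x
        drift = z_next - z
        z = z_next
    return x, y, drift

x, y, drift = drs_infeasible()
print(x, y, drift)  # → 1.0 2.0 1.0
```

In the conic-program setting, this kind of structured divergence is what the paper mines for infeasibility certificates and separating hyperplanes.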
Risk-Constrained Kelly Gambling
We consider the classic Kelly gambling problem with general distribution of
outcomes, and an additional risk constraint that limits the probability of a
drawdown of wealth to a given undesirable level. We develop a bound on the
drawdown probability; using this bound instead of the original risk constraint
yields a convex optimization problem that guarantees the drawdown risk
constraint holds. Numerical experiments show that our bound on drawdown
probability is reasonably close to the actual drawdown risk, as computed by
Monte Carlo simulation. Our method is parametrized by a single parameter that
has a natural interpretation as a risk-aversion parameter, allowing us to
systematically trade off asymptotic growth rate and drawdown risk. Simulations
show that this method yields bets that outperform fractional-Kelly bets for
the same drawdown risk level or growth rate. Finally, we show that a natural
quadratic approximation of our convex problem is closely connected to the
classical mean-variance Markowitz portfolio selection problem.
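For background, the classical unconstrained Kelly criterion for a binary bet has a closed form; the sketch below shows it and verifies optimality numerically. This is the textbook setup, an illustrative assumption rather than the paper's risk-constrained formulation.

```python
# Classical Kelly bet: win probability p, net odds b (win b per unit
# staked). The growth-optimal fraction is f* = p - (1 - p)/b, which
# maximizes g(f) = p*log(1 + f*b) + (1 - p)*log(1 - f).
import math

def kelly_fraction(p, b):
    return p - (1 - p) / b

def growth_rate(f, p, b):
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)

p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)
print(f_star)  # ≈ 0.2 for a 60/40 bet at even odds
```

Fractional-Kelly bets scale `f_star` by a constant in (0, 1); the paper instead trades growth against drawdown risk through an explicit convex constraint.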
Splitting with Near-Circulant Linear Systems: Applications to Total Variation CT and PET
Many imaging problems, such as total variation reconstruction of X-ray
computed tomography (CT) and positron-emission tomography (PET), are solved via
a convex optimization problem with near-circulant, but not actually circulant,
linear systems. The popular methods to solve these problems, alternating
direction method of multipliers (ADMM) and primal-dual hybrid gradient (PDHG),
do not directly utilize this structure. Consequently, ADMM requires a costly
matrix inversion as a subroutine, and PDHG takes too many iterations to
converge. In this paper, we present near-circulant splitting (NCS), a novel
splitting method that leverages the near-circulant structure. We show that NCS
can converge with an iteration count close to that of ADMM, while paying a
computational cost per iteration close to that of PDHG. Through experiments on
a CUDA GPU, we empirically validate the theory and demonstrate that NCS can
effectively utilize the parallel computing capabilities of CUDA. Comment: Published in SIAM Journal on Scientific Computing.
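The building block NCS leans on is that a truly circulant system diagonalizes in the Fourier basis and can be solved with three DFTs. The sketch below shows only this exactly-circulant case, with a tiny hand-rolled DFT for self-containment (in practice one would use a fast FFT, e.g. numpy's).

```python
# Solve the circulant system C x = b, where C has first column c and
# C_{jk} = c_{(j-k) mod n}: x = IDFT(DFT(b) / DFT(c)).
import cmath

def dft(v, sign=-1):
    n = len(v)
    return [sum(v[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                for k in range(n)) for m in range(n)]

def circulant_solve(c, b):
    n = len(c)
    fc, fb = dft(c), dft(b)
    fx = [fb[m] / fc[m] for m in range(n)]      # divide by eigenvalues
    return [v.real / n for v in dft(fx, sign=+1)]  # inverse DFT

c = [4.0, 1.0, 0.0, 1.0]   # first column of a circulant C
b = [1.0, 2.0, 3.0, 4.0]
x = circulant_solve(c, b)
# residual check: multiply back using C_{jk} = c_{(j-k) mod 4}
r = [sum(c[(j - k) % 4] * x[k] for k in range(4)) - b[j] for j in range(4)]
print(max(abs(v) for v in r))  # residual near 0
```

Near-circulant systems, as in CT and PET, do not diagonalize exactly, which is why NCS splits the operator rather than inverting it outright.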
Vector and Matrix Optimal Mass Transport: Theory, Algorithm, and Applications
In many applications such as color image processing, data has more than one
piece of information associated with each spatial coordinate, and in such cases
the classical optimal mass transport (OMT) must be generalized to handle
vector-valued or matrix-valued densities. In this paper, we discuss the vector
and matrix optimal mass transport and present three contributions. We first
present a rigorous mathematical formulation for these setups and provide
analytical results including existence of solutions and strong duality. Next,
we present simple, scalable, and parallelizable methods to solve the vector
and matrix-OMT problems. Finally, we implement the proposed methods on a CUDA
GPU and present experiments and applications. Comment: 22 pages, 5 figures, 3 tables.
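As scalar background for the classical OMT that the paper generalizes: in one dimension, optimal transport between two equal-weight empirical measures reduces to matching sorted samples. The point sets below are illustrative assumptions; vector- and matrix-valued densities require the richer formulation the paper develops.

```python
# 1-Wasserstein distance between equal-weight empirical measures in 1D:
# the optimal coupling is the monotone (sorted) matching.

def w1_distance(xs, ys):
    """W1 between uniform measures on the point sets xs and ys."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(w1_distance([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # → 1.0
```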